Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction

نویسندگان

  • Flavio Massimiliano Cecchini
  • Christian Biemann
  • Martin Riedl
چکیده

In this paper we define two parallel data sets based on pseudowords, extracted from the same corpus. They both consist of word-centered graphs for each of 1225 different pseudowords, and use respectively first-order co-occurrences and secondorder semantic similarities. We propose an evaluation framework on these data sets for graph-based Word Sense Induction (WSI) focused on the case of coarsegrained homonymy: We compare different WSI clustering algorithms by measuring how well their outputs agree with the a priori known ground-truth decomposition of a pseudoword. We perform this evaluation for four different clustering algorithms: the Markov cluster algorithm, Chinese Whispers, MaxMax and a gangplankbased clustering algorithm. To further improve the comparison between these algorithms and the analysis of their behaviours, we also define a new specific evaluation measure. As far as we know, this is the first large-scale systematic pseudoword evaluation dedicated to the induction of coarsegrained homonymous word senses.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Large-Scale Pseudoword-Based Evaluation Framework for State-of-the-Art Word Sense Disambiguation

The evaluation of several tasks in lexical semantics is often limited by the lack of large numbers of manual annotations, not only for training purposes, but also for testing purposes. Word Sense Disambiguation (WSD) is a case in point, as hand-labeled data sets are particularly hard and time-consuming to create. Consequently, evaluations tend to be performed on a small scale, which does not al...

متن کامل

WoSIT: A Word Sense Induction Toolkit for Search Result Clustering and Diversification

In this demonstration we present WoSIT, an API for Word Sense Induction (WSI) algorithms. The toolkit provides implementations of existing graph-based WSI algorithms, but can also be extended with new algorithms. The main mission of WoSIT is to provide a framework for the extrinsic evaluation of WSI algorithms, also within end-user applications such as Web search result clustering and diversifi...

متن کامل

Graph Based Algorithms for Word Sense Induction and Disambiguation

This paper presents a survey of graph based methods for word sense induction and disambiguation. Many areas of Natural Language Processing like Word Sense Disambiguation (WSD), text summarization, keyword extraction make use of Graph based methods. The very idea behind graph based approach is to formulate the problems in graph setting and apply clustering to obtain a set of clusters (senses). T...

متن کامل

Graph-Based Induction of Word Senses in Croatian

Word sense induction (WSI) seeks to induce senses of words from unannotated corpora. In this paper, we address the WSI task for the Croatian language. We adopt the word clustering approach based on co-occurrence graphs, in which senses are taken to correspond to strongly inter-connected components of co-occurring words. We experiment with a number of graph construction techniques and clustering...

متن کامل

Statistical Corpus-Based Word Sense Disambiguation: Pseudowords vs. Real Ambiguous Words

In this paper we investigate whether the task of disambiguating pseudowords (artificial ambiguous words) is comparable to the disambiguation of real ambiguous words. Since the two methods are inherently different, a direct comparison is not possible. An indirect approach is taken where the setup for both systems is as similar as possible, i.e. using the same corpus and settings. The results obt...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017